Existing style transfer algorithms work by minimizing a hybrid loss function that pushes the generated image toward high similarity in both content and style. However, this type of approach cannot guarantee visual fidelity, i.e., that the generated artworks are indistinguishable from real ones. In this paper, we devise a new style transfer framework called QuantArt for high-visual-fidelity stylization. QuantArt pushes the latent representation of the generated artwork toward the centroids of the real-artwork distribution with vector quantization. By fusing the quantized and continuous latent representations, QuantArt allows flexible control over the generated artworks in terms of content preservation, style similarity, and visual fidelity. Experiments on various style transfer settings show that our QuantArt framework achieves significantly higher visual fidelity than existing style transfer methods.
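The centroid-snapping step can be pictured as plain nearest-neighbor vector quantization. The sketch below (NumPy; `vector_quantize`, `fuse`, and the blending weight `alpha` are illustrative names, not the paper's API) shows quantizing latents against a codebook and blending the quantized and continuous representations:

```python
import numpy as np

def vector_quantize(latents, codebook):
    """Map each latent vector to its nearest codebook centroid (L2 distance).

    latents:  (N, D) continuous latent vectors
    codebook: (K, D) centroids of the real-artwork distribution
    Returns the quantized latents (N, D) and the chosen indices (N,).
    """
    # Pairwise squared distances between latents and centroids: (N, K).
    d = ((latents[:, None, :] - codebook[None, :, :]) ** 2).sum(-1)
    idx = d.argmin(axis=1)
    return codebook[idx], idx

def fuse(quantized, continuous, alpha):
    """Hypothetical fusion rule: alpha=1 favors visual fidelity (pure
    quantized code), alpha=0 favors content preservation (pure continuous
    code); intermediate values trade the two off."""
    return alpha * quantized + (1.0 - alpha) * continuous
```

The actual fusion in the paper is learned; the linear blend above only illustrates why a continuous knob between the two representations gives control over the fidelity/content trade-off.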
The number of international benchmarking competitions is steadily increasing in various fields of machine learning (ML) research and practice. So far, however, little is known about the common practice as well as the bottlenecks faced by the community in tackling the research questions posed. To shed light on the status quo of algorithm development in the specific field of biomedical image analysis, we designed an international survey that was issued to all participants of challenges conducted in conjunction with the IEEE ISBI 2021 and MICCAI 2021 conferences (80 competitions in total). The survey covered participants' expertise and working environments, their chosen strategies, as well as algorithm characteristics. A median of 72% of challenge participants took part in the survey. According to our results, knowledge exchange was the primary incentive (70%) for participation, while the reception of prize money played only a minor role (16%). While a median of 80 working hours was spent on method development, a large portion of participants stated that they did not have enough time for method development (32%), and 25% perceived the infrastructure to be a bottleneck. Overall, 94% of all solutions were deep learning-based; of these, 84% were based on standard architectures. 43% of the respondents reported that the data samples (e.g., images) were too large to be processed at once, which was most commonly addressed by patch-based training (69%), downsampling (37%), and solving 3D analysis tasks as a series of 2D tasks. K-fold cross-validation on the training set was performed by only 37% of the participants, and only 50% performed ensembling, based on multiple identical models (61%) or heterogeneous models (39%). 48% of the respondents applied postprocessing steps.
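For reference, the validation strategy that most participants skipped, k-fold cross-validation, amounts to the index split below; this is a generic plain-Python sketch (`kfold_indices` is a hypothetical helper, not tied to any challenge pipeline):

```python
def kfold_indices(n_samples, k):
    """Split sample indices 0..n_samples-1 into k disjoint, contiguous folds.
    Each fold serves once as the validation set while the remaining folds
    form the training set, so every sample is validated on exactly once."""
    # Distribute the remainder over the first n_samples % k folds.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0)
                  for i in range(k)]
    splits, start = [], 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n_samples))
        splits.append((train, val))
        start += size
    return splits
```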
Existing deep-learning-based real image denoising methods require a large amount of noisy-clean image pairs for supervision. Nonetheless, capturing a real noisy-clean dataset is an unacceptably expensive and cumbersome procedure. To alleviate this problem, this work investigates how to generate realistic noisy images. Firstly, we formulate a simple yet reasonable noise model that treats each real noisy pixel as a random variable. This model splits the noisy image generation problem into two sub-problems: image domain alignment and noise domain alignment. Subsequently, we propose a novel framework, the Pixel-level Noise-aware Generative Adversarial Network (PNGAN). PNGAN employs a pre-trained real denoiser to map fake and real noisy images into a nearly noise-free solution space to perform image domain alignment. Meanwhile, PNGAN establishes a pixel-level adversarial training to conduct noise domain alignment. Additionally, for better noise fitting, we present an efficient architecture, the Simple Multi-scale Network (SMNet), as the generator. Qualitative validation shows that noise generated by PNGAN is highly similar to real noise in terms of intensity and distribution. Quantitative experiments demonstrate that a series of denoisers trained with the generated noisy images achieve state-of-the-art (SOTA) results on four real denoising benchmarks. Part of the code, pre-trained models, and results are available at https://github.com/caiyuanhao1998/pngan for comparison.
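Treating each noisy pixel as a random variable can be illustrated with a simple heteroscedastic Gaussian (shot + read) noise model. This is a common stand-in, not PNGAN's formulation: PNGAN learns the noise distribution adversarially rather than assuming a parametric form, and the function and parameter names below are hypothetical.

```python
import numpy as np

def synthesize_noisy(clean, shot=0.01, read=0.005, rng=None):
    """Sample a synthetic noisy image from a clean one, treating each pixel
    as a random variable whose variance has a signal-dependent (shot) and a
    signal-independent (read) component.

    clean: array with intensities in [0, 1].
    """
    rng = np.random.default_rng(rng)
    var = shot * clean + read          # per-pixel noise variance
    noisy = clean + rng.normal(0.0, 1.0, clean.shape) * np.sqrt(var)
    return np.clip(noisy, 0.0, 1.0)
```

Under this view, "image domain alignment" matches the clean content underlying real and fake noisy images, while "noise domain alignment" matches the per-pixel residual distributions.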
We present PyTorch Connectomics (PyTC), an open-source deep learning framework for the semantic and instance segmentation of volumetric microscopy images, built upon PyTorch. We demonstrate the effectiveness of PyTC in the field of connectomics, which aims to segment and reconstruct neurons, synapses, and other organelles like mitochondria at nanometer resolution for understanding neuronal communication, metabolism, and development in animal brains. PyTC is a scalable and flexible toolbox that tackles datasets at different scales and supports multi-task and semi-supervised learning to better exploit expensive expert annotations and the vast amount of unlabeled data during training. These functionalities can be easily realized in PyTC by changing the configuration options without coding, and the toolbox is adaptable to other 2D and 3D segmentation tasks for different tissues and imaging modalities. Quantitatively, our framework achieves the best performance in the CREMI challenge for synaptic cleft segmentation (by a relative margin of 6.1%) and competitive performance on mitochondria and neuronal nuclei segmentation. Code and tutorials are publicly available at https://connectomics.readthedocs.io.
Video instance segmentation aims to detect, segment, and track objects in a video. Current approaches extend image-level segmentation algorithms to the temporal domain. However, this results in temporally inconsistent masks. In this work, we identify the mask quality loss caused by missed detections as a performance bottleneck. Motivated by this, we propose a video instance segmentation method that alleviates the problems arising from missed detections. Since this cannot be solved with spatial information alone, we leverage temporal context using inter-frame attention. This allows our network to recover missed objects using box predictions from the neighbouring frame, thereby overcoming missed detections. Our method significantly outperforms previous state-of-the-art algorithms, achieving 35.1% mAP on the YouTube-VIS benchmark. Moreover, our method is completely online and requires no future frames. Our code is publicly available at https://github.com/anirudh-chakravarthy/objprop.
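The recovery of missed detections from a neighbouring frame can be caricatured with plain IoU-based box carry-over. The actual method uses learned, feature-level propagation via inter-frame attention, so the sketch below (hypothetical helpers, boxes as `(x1, y1, x2, y2)` tuples) is only a geometric stand-in:

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def propagate_missing(prev_boxes, cur_boxes, thr=0.5):
    """Carry over boxes from the previous frame that have no sufficiently
    overlapping detection in the current frame, recovering objects the
    per-frame detector missed."""
    recovered = [p for p in prev_boxes
                 if all(iou(p, c) < thr for c in cur_boxes)]
    return cur_boxes + recovered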
We introduce MedMNIST v2, a large-scale MNIST-like collection of standardized biomedical images, including 12 datasets for 2D and 6 datasets for 3D. All images are pre-processed to a small size of 28x28 (2D) or 28x28x28 (3D) with the corresponding classification labels, so that no background knowledge is required of users. Covering the primary data modalities in biomedical images, MedMNIST v2 is designed to perform classification on lightweight 2D and 3D images with various dataset scales (from 100 to 100,000 samples) and diverse tasks (binary/multi-class, ordinal regression, and multi-label). The resulting collection, consisting of 708,069 2D images and 10,214 3D images in total, could support numerous research and educational purposes in biomedical image analysis, computer vision, and machine learning. We benchmark several baseline methods on MedMNIST v2, including 2D/3D neural networks and open-source/commercial AutoML tools. The data and code are publicly available at https://medmnist.com/.
Segmenting 3D cell nuclei from microscopy image volumes is critical for biological and clinical analyses, enabling the study of cellular expression patterns and cell lineages. However, current datasets for neuronal nuclei usually contain volumes smaller than 10^-3 mm^3 with fewer than 500 instances per volume, which cannot reveal the complexity of large brain regions and restricts the investigation of neuronal structures. In this paper, we push the task forward to the sub-cubic-millimeter scale and curate the NucMM dataset with two fully annotated volumes: a 0.1 mm^3 electron microscopy (EM) volume containing nearly an entire zebrafish brain with around 170,000 nuclei, and a 0.25 mm^3 micro-CT (uCT) volume containing part of a mouse visual cortex with about 7,000 nuclei. With two imaging modalities and significantly increased volume size and instance numbers, we discover a great diversity of neuronal nuclei in appearance and density, introducing new challenges to the field. We also perform a statistical analysis to illustrate those challenges quantitatively. To tackle them, we propose a novel hybrid-representation learning model that combines the merits of foreground masks, contour maps, and signed distance transforms to produce high-quality 3D masks. Benchmark comparisons on the NucMM dataset show that our proposed method significantly outperforms state-of-the-art nuclei segmentation approaches. Code and data are available at https://connectomics-bazaar.github.io/proj/nucmm/index.html.
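The three targets combined by the hybrid representation can all be derived from a binary mask. The toy 2D, brute-force version below just makes the definitions concrete; the actual model operates on 3D volumes and would use fast distance transforms, and `hybrid_targets` is an illustrative name, not the paper's code:

```python
import math

def hybrid_targets(mask):
    """From a binary mask (list of lists of 0/1), derive three targets:
    the foreground mask itself, a contour map (foreground pixels with a
    background or out-of-bounds 4-neighbor), and a signed distance map
    (positive inside the object, negative outside)."""
    h, w = len(mask), len(mask[0])
    inside = {(y, x) for y in range(h) for x in range(w) if mask[y][x]}
    outside = {(y, x) for y in range(h) for x in range(w) if not mask[y][x]}

    def is_contour(y, x):
        if not mask[y][x]:
            return 0
        for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            ny, nx = y + dy, x + dx
            if not (0 <= ny < h and 0 <= nx < w) or not mask[ny][nx]:
                return 1
        return 0

    contour = [[is_contour(y, x) for x in range(w)] for y in range(h)]

    def sdist(y, x):
        # Distance to the nearest pixel of the opposite class (brute force).
        other = outside if mask[y][x] else inside
        if not other:
            return float('inf') if mask[y][x] else float('-inf')
        d = min(math.hypot(y - oy, x - ox) for oy, ox in other)
        return d if mask[y][x] else -d

    sdt = [[sdist(y, x) for x in range(w)] for y in range(h)]
    return mask, contour, sdt
```

The intuition for combining them: the mask gives the foreground, the contour separates touching nuclei, and the signed distance encodes instance interiors smoothly, which together yield cleaner 3D instance masks than any single target.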
Many video enhancement algorithms rely on optical flow to register frames in a video sequence. Precise flow estimation is, however, intractable, and optical flow itself is often a sub-optimal representation for particular video processing tasks. In this paper, we propose task-oriented flow (TOFlow), a motion representation learned in a self-supervised, task-specific manner. We design a neural network with a trainable motion estimation component and a video processing component, and train them jointly to learn the task-oriented flow. For evaluation, we build Vimeo-90K, a large-scale, high-quality video dataset for low-level video processing. TOFlow outperforms traditional optical flow on standard benchmarks as well as our Vimeo-90K dataset in three video processing tasks: frame interpolation, video denoising/deblocking, and video super-resolution.
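The frame-registration operation that flow (task-oriented or classical) feeds into is backward warping. A minimal NumPy sketch, using nearest-neighbor sampling with border clamping for brevity (real pipelines use differentiable bilinear sampling; the function name is illustrative):

```python
import numpy as np

def warp(frame, flow):
    """Backward-warp a frame with a per-pixel flow field: output pixel
    (y, x) samples the input at (y + dy, x + dx), the basic operation used
    to register a neighboring frame onto the reference frame.

    frame: (H, W) array; flow: (H, W, 2) array of (dy, dx) offsets.
    """
    h, w = frame.shape
    ys, xs = np.mgrid[0:h, 0:w]
    sy = np.clip(np.rint(ys + flow[..., 0]).astype(int), 0, h - 1)
    sx = np.clip(np.rint(xs + flow[..., 1]).astype(int), 0, w - 1)
    return frame[sy, sx]
```

Training the flow estimator jointly with the downstream task lets the network learn offsets that make this warp most useful for, say, interpolation, rather than offsets that are most physically accurate.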
Masked image modeling (MIM) performs strongly in pre-training large vision Transformers (ViTs). However, small models that are critical for real-world applications cannot or can only marginally benefit from this pre-training approach. In this paper, we explore distillation techniques to transfer the success of large MIM-based pre-trained models to smaller ones. We systematically study different options in the distillation framework, including distillation targets, losses, input, network regularization, sequential distillation, etc., revealing that: 1) Distilling token relations is more effective than CLS-token- and feature-based distillation; 2) Using an intermediate layer of the teacher network as the target performs better than using the last layer when the depth of the student mismatches that of the teacher; 3) Weak regularization is preferred; etc. With these findings, we achieve significant fine-tuning accuracy improvements over MIM pre-training from scratch on ImageNet-1K classification, using the ViT-Tiny, ViT-Small, and ViT-Base models, with +4.2%/+2.4%/+1.4% gains, respectively. Our TinyMIM model of base size achieves 52.2 mIoU on ADE20K semantic segmentation, which is +4.1 higher than the MAE baseline. Our TinyMIM model of tiny size achieves 79.6% top-1 accuracy on ImageNet-1K image classification, which sets a new record for small vision models of the same size and computation budget. This strong performance suggests an alternative way of developing small vision Transformer models: exploring better training methods rather than introducing inductive biases into architectures, as in most previous works. Code is available at https://github.com/OliverRensu/TinyMIM.
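Finding 1), distilling token relations, can be sketched as matching the student's and teacher's token-to-token affinity maps rather than raw features. The loss form below (cross-entropy between softmax relation maps with temperature `tau`) is an illustrative reconstruction, not the paper's exact objective; note that student and teacher feature dimensions need not match, since only the N x N relations are compared:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def relation_distill_loss(student_tokens, teacher_tokens, tau=1.0):
    """Token-relation distillation sketch. Each model's tokens (N, D) induce
    an N x N relation map via softmax of scaled dot products; the loss is
    the mean cross-entropy between the teacher's and student's maps."""
    def relations(t):
        scale = 1.0 / np.sqrt(t.shape[-1])
        return softmax(t @ t.T * scale / tau, axis=-1)
    r_s, r_t = relations(student_tokens), relations(teacher_tokens)
    return float(-(r_t * np.log(r_s + 1e-12)).sum(axis=-1).mean())
```

Since cross-entropy is minimized exactly when the student's relation map equals the teacher's, the loss drives the small model to mimic how the large model relates image patches to one another.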
Given the increasingly intricate forms of partial differential equations (PDEs) in physics and related fields, computationally solving PDEs that lack analytic solutions inevitably suffers from a trade-off between accuracy and efficiency. Recent advances in neural operators, a class of mesh-independent, neural-network-based PDE solvers, suggest that this challenge may be surmountable. In this emerging direction, the Koopman neural operator (KNO) is a representative demonstration that outperforms other state-of-the-art alternatives in terms of accuracy and efficiency. Here we present KoopmanLab, a self-contained and user-friendly PyTorch module of the Koopman neural operator family for solving partial differential equations. Beyond the original version of KNO, we develop multiple new variants of KNO based on different neural network architectures to improve the general applicability of our module. These variants are validated by mesh-independent and long-term prediction experiments implemented on representative PDEs (e.g., the Navier-Stokes equation and the Bateman-Burgers equation) and on ERA5 (one of the largest high-resolution datasets of global-scale climate fields). These demonstrations suggest the potential of KoopmanLab for diverse applications involving partial differential equations.
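The finite-dimensional core of a Koopman-style solver is a linear operator fitted on (lifted) state snapshots, so that dynamics evolve linearly in the lifted space. A minimal DMD-like NumPy sketch, using the raw states as observables (the neural variants additionally learn the lifting; function names are illustrative, not KoopmanLab's API):

```python
import numpy as np

def fit_koopman(snapshots):
    """Estimate a finite-dimensional Koopman operator K from a trajectory of
    state snapshots (T, D) by least squares, so that K @ x_t ~= x_{t+1}."""
    X, Y = snapshots[:-1], snapshots[1:]          # (T-1, D) input/target pairs
    B, *_ = np.linalg.lstsq(X, Y, rcond=None)     # solves X @ B ~= Y (row form)
    return B.T                                    # column form: K @ x_t ~= x_{t+1}

def rollout(K, x0, steps):
    """Predict a trajectory by repeatedly applying the linear operator."""
    traj = [np.asarray(x0, dtype=float)]
    for _ in range(steps):
        traj.append(K @ traj[-1])
    return np.stack(traj)
```

Long-term prediction then reduces to repeated matrix application, which is why accuracy and efficiency hinge on how well the learned observables make the dynamics linear.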